ABSTRACT


A closed-form,  integration-based,  and  massively-parallel  algorithm  for 
determining  depth  of  points  in  3-D  using  one  moving  camera  is  presented.  It  is 
based  on  analyzing  a sequence  of  images  that  result  from  a known  rectilinear 
motion  of  a camera  (with  no  rotation)  in  a stationary  environment.  A 
traceable  point  in  an  image  sequence  is  reconstructed  using  an  integration 
operation  (no  differentiation  operator  is  involved). 

The  method  arose  from  two  simple  observations: 

(1)  Stationary  points  in  the  3-D  scene  appear  to  move  away  from  the 
focus  of  expansion  (FOE). 

(2)  The  distance  of  a point  in  3-D  space  from  the  camera  motion-axis  is 
the  same  at  all  instants  of  time. 

Any  visible  moving  point  in  the  image  can  be  processed  independently  of, 
and  concurrently  with,  any  other  point.  Laboratory  results  for  the  case  where 
the  optical  axis  is  parallel  to  the  motion  axis  show  an  error  of  less  than  0.6%  in 
absolute  distance. 


1.  INTRODUCTION 

This  paper  presents  a new,  robust,  integration-based,  and  massively 
parallel  algorithm  for  determining  depth  of  points  in  3-D  using  one  moving 
camera.  It  is  based  on  analyzing  a sequence  of  images  that  result  from  a 
known  rectilinear  motion  of  a camera  (with  no  rotation).  A traceable  point  in 
the  image  sequence  is  reconstructed  using  an  integration  operation. 

The  method  arose  from  two  simple  observations: 

(1)  Stationary  points  in  the  3-D  scene  appear  to  move  away  from  the 
focus  of  expansion  (FOE)  (or  toward  the  focus  of  contraction). 


(2)  The  distance  of  a point  in  3-D  space  from  the  camera  motion  axis  is 
the  same  at  all  instants  of  time. 

Observation (1) provides the concurrent-processing property, since each
radial image line can be processed independently. Using the θ-φ rather than
the X-Y image-plane coordinate system makes the algorithm very simple
(constant φ corresponds to a radial line that emerges from the FOE; constant
θ corresponds to a circle in the image plane whose center is the FOE).
Tracking a point in the image sequence is directional (i.e., along a
constant-φ line) and thus computationally inexpensive. In fact, each point on
a constant-φ line can be processed on a separate processor. If θ and dθ/dt of
a point in the image plane are known, as well as the speed of the camera,
then the location of the corresponding point in space can be calculated
explicitly. In the proposed method dθ/dt is not measured directly; it is
calculated indirectly using an integration operation. One of the major
advantages of this technique is that the integration operator "smooths out"
errors caused by an unfocused camera, camera noise, (x-y) to (θ-φ) image
conversion error, inexact edge location, non-ideal motion of the camera,
etc. Also, the method is more appropriate for structured environments.

Using  this  method  the  distance  of  a traceable  point  from  the  axis  of 
motion  of  the  camera  can  be  obtained.  Given  this  distance  and  the  location  of 
the  point  in  the  image  plane,  the  location  of  the  point  in  3-D  in  camera 
coordinates  can  be  explicitly  found. 
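A minimal Python sketch of that last step follows. It assumes, as in the experiments of Section 4, that the motion axis coincides with the optical axis (taken here as the camera Z axis), that θ is the angle between the line of sight and that axis, and that φ is the azimuth of the radial image line through the FOE; it uses the relation R = d / sin θ derived in Section 3 as Equation (4). The function name is hypothetical.

```python
import math


def point_from_d_theta_phi(d, theta, phi):
    """Return (X, Y, Z) of a scene point in camera coordinates.

    Assumption: Z is the motion axis (here also the optical axis), theta is the
    angle between the line of sight and Z, and phi is the azimuth around Z.
    """
    if not 0.0 < theta < math.pi:
        raise ValueError("theta must lie strictly between 0 and pi")
    r = d / math.sin(theta)   # range from the pinhole, R = d / sin(theta)
    z = r * math.cos(theta)   # component along the motion axis
    x = d * math.cos(phi)     # d is the distance from the axis, so it splits
    y = d * math.sin(phi)     # into X and Y according to the azimuth phi
    return x, y, z
```

For example, `point_from_d_theta_phi(100.0, math.atan2(100.0, 972.8), 0.0)` returns approximately (100.0, 0.0, 972.8): a point 100 mm off-axis and 972.8 mm ahead of the pinhole.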

In a set of experiments we obtained the distance of a point from the
camera's axis of motion (which, in our case, is also the optical axis of the
camera). An accuracy of 99.4% has been achieved from a sequence of 40
images. We anticipate a significantly smaller error when the camera's optical
axis is perpendicular to the motion axis.

Related work deals with depth estimation from two consecutive images
[3-6] and from a larger sequence of images [1,2,7-9,12], using mainly optical
flow [3-9,12,20,21], temporal and spatial brightness gradients [1,10,11,18],
correlation-based methods [20], and epipolar analysis [2].

2. THE θ-φ DOMAIN

During a rectilinear motion (with no rotation), points in the image plane
move away from the FOE (Figure 1). Based on this observation, our method uses
an angular θ-φ (rather than X-Y) image plane. Figures 2 and 3 show the chosen
coordinate system and the definition of the angles θ and φ: constant φ
corresponds to a radial line that emerges from the FOE, and constant θ
corresponds to a circle whose center is the FOE. Clearly, a point (x,y) in the
X-Y plane can be transformed to a point (θ,φ) in the θ-φ plane and vice versa.
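The conversion can be sketched as follows. This assumes a pinhole camera whose optical axis coincides with the motion axis, so that the FOE is the image point where that axis meets the image plane; `f` is the focal length expressed in pixels and (`foe_x`, `foe_y`) is the FOE in pixel coordinates. Both functions are hypothetical helpers; the conversion actually used in the experiments (Section 4.4) was a precomputed look-up table.

```python
import math


def xy_to_theta_phi(x, y, foe_x, foe_y, f):
    """Map an image point (pixels) to (theta, phi) in radians.

    theta: angle between the line of sight and the motion axis
           (constant on circles centred at the FOE).
    phi:   azimuth of the radial line through the FOE (constant on radial lines).
    """
    dx, dy = x - foe_x, y - foe_y
    r = math.hypot(dx, dy)        # radial distance from the FOE in the image
    theta = math.atan2(r, f)      # pinhole model: tan(theta) = r / f
    phi = math.atan2(dy, dx)
    return theta, phi


def theta_phi_to_xy(theta, phi, foe_x, foe_y, f):
    """Inverse mapping, valid for 0 <= theta < pi/2 (point in front of the camera)."""
    r = f * math.tan(theta)
    return foe_x + r * math.cos(phi), foe_y + r * math.sin(phi)
```

Round-tripping any pixel through both functions returns the original coordinates up to floating-point rounding.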

We assume that the camera moves in a known rectilinear motion. Using the θ-φ
coordinate system, any point in the image plane moves along a constant-φ
line. This fact provides a concurrent processing capability (each constant-φ
line can be processed separately and independently of any other φ line).
Also, each point on a constant-φ line can be processed independently of other
points that lie on the same line. As will be shown later, the expression for
the distance of a point in space from the camera pinhole is independent of φ.
A constant-φ line can be processed as a 1-D image. Figure 4 shows an example
of the time evolution of a 1-D image.
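Treating a constant-φ line as a 1-D image amounts to sampling intensities along a ray that starts at the FOE. The sketch below assumes a grayscale image stored as a list of rows, the FOE given as (column, row) pixel coordinates, and nearest-neighbour lookup at fixed radial steps; interpolated sampling would be a natural refinement. The function name is hypothetical.

```python
import math


def radial_profile(image, foe_x, foe_y, phi, step=1.0):
    """Sample a grayscale image along the ray of constant phi that starts at the FOE.

    image: list of rows (image[row][col]); (foe_x, foe_y) is the FOE as (col, row).
    Returns (radius_in_pixels, intensity) pairs using nearest-neighbour lookup.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    height, width = len(image), len(image[0])
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    samples = []
    k = 0
    while True:
        r = k * step
        col = round(foe_x + r * cos_p)
        row = round(foe_y + r * sin_p)
        if not (0 <= row < height and 0 <= col < width):
            break  # the ray has left the image
        samples.append((r, image[row][col]))
        k += 1
    return samples
```

Calling it once per image of the sequence for the same φ yields the stack of 1-D images whose time evolution Figure 4 illustrates.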

3.  THE  3-D  RECONSTRUCTION  ALGORITHM 

Using the θ-φ coordinate system, we show a simple way to find the 3-D
location of a point in space. Along a constant-φ line, for small enough
changes in time and a continuous speed V(t) of the camera, the following
relations hold for the point P (Figure 5). Assuming a pinhole model of the
camera:


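$$ \Delta\ell = V(t)\,\Delta t\,\sin\theta(t) \tag{1} $$

$$ \Delta\ell = R(t)\,\tan\Delta\theta(t) \tag{2} $$

From (1) and (2), and letting Δt → 0 (note that tan Δθ(t) ≈ Δθ(t) as
Δθ(t) → 0), we get:

$$ \frac{V(t)}{R(t)} = \frac{d\theta(t)/dt}{\sin\theta(t)} \tag{3} $$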

With a known V(t) and measured θ(t) and dθ(t)/dt, R(t) can be calculated from
Equation (3).
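Equation (3) can be read directly as a range formula. A minimal sketch, assuming θ and dθ/dt are already available (the very quantity the rest of this section avoids obtaining by differentiation); the function name is hypothetical:

```python
import math


def range_from_flow(v, theta, theta_dot):
    """Equation (3) solved for R: R = V sin(theta) / (dtheta/dt).

    v: camera speed, theta: angle to the motion axis (rad),
    theta_dot: dtheta/dt (rad per unit time). R is in the length unit of v * time.
    """
    if theta_dot == 0.0:
        raise ValueError("dtheta/dt is zero: point on the motion axis or at infinity")
    return v * math.sin(theta) / theta_dot
```

As a check, a point 1 unit off-axis and 1 unit ahead has θ = π/4; with V = 2 its angular rate is 1 rad per unit time, and `range_from_flow(2.0, math.pi / 4, 1.0)` returns √2 ≈ 1.414, the true range.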

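Now let us consider Figure 6. During rectilinear motion the distance d of the
point P from the motion axis is constant at all times, so that

$$ R(t) = \frac{d}{\sin\theta(t)} \tag{4} $$

By substituting (4) in (3) we get

$$ \frac{V(t)}{d} = \frac{d\theta(t)/dt}{\sin^2\theta(t)} \tag{5} $$

or

$$ \frac{d\theta(t)}{dt} = \frac{V(t)}{d}\,\sin^2\theta(t) \tag{6} $$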
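Equation (6) can be checked numerically against the geometry of Figure 6. The sketch assumes a hypothetical point at perpendicular distance d from the motion axis and initial axial distance z0 ahead of a pinhole moving at constant speed v, so that θ(t) = atan2(d, z0 - v·t); a central finite difference of θ is compared with (v/d)·sin²θ. The difference quotient serves only as a check here; the method itself avoids differentiation. All names and default values are illustrative.

```python
import math


def theta_at(t, d, z0, v):
    """Angle between the motion axis and the line of sight at time t."""
    return math.atan2(d, z0 - v * t)


def check_eq6(d=150.0, z0=900.0, v=25.0, t=4.0, h=1e-4):
    """Return (finite-difference dtheta/dt, Equation (6) prediction)."""
    numeric = (theta_at(t + h, d, z0, v) - theta_at(t - h, d, z0, v)) / (2.0 * h)
    predicted = v * math.sin(theta_at(t, d, z0, v)) ** 2 / d
    return numeric, predicted
```

The two values agree to many significant digits for any configuration in which the point stays ahead of the camera (z0 - v·t > 0).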

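Integrating dθ(t)/dt with respect to time yields:

$$ \theta(t_2) - \theta(t_1) = \frac{1}{d}\int_{t_1}^{t_2} V(t)\,\sin^2\theta(t)\,dt \tag{7} $$

and thus

$$ d = \frac{\int_{t_1}^{t_2} V(t)\,\sin^2\theta(t)\,dt}{\theta(t_2) - \theta(t_1)} \tag{8} $$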
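Equation (8) needs only the measured angles, the known speed profile, and a quadrature rule. A minimal sketch using the composite trapezoidal rule; the rule is an assumption made here (Section 4.5 describes the integration used in the experiments), and all names are hypothetical:

```python
import math


def trapezoid(xs, ys):
    """Composite trapezoidal rule for samples ys taken at abscissae xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0 for i in range(len(xs) - 1))


def distance_from_time_samples(t, theta, v):
    """Equation (8): d = integral of V sin^2(theta) dt / (theta(t2) - theta(t1)).

    t, theta, v are equal-length sequences of sample times, measured angles
    (radians) and known camera speeds; t1 and t2 are the first and last samples.
    """
    if len(t) < 2 or not (len(t) == len(theta) == len(v)):
        raise ValueError("need at least two samples of equal-length t, theta, v")
    dtheta = theta[-1] - theta[0]
    if dtheta == 0.0:
        raise ValueError("theta did not change, so d is undetermined")
    integrand = [vk * math.sin(th) ** 2 for vk, th in zip(v, theta)]
    return trapezoid(t, integrand) / dtheta
```

With noise-free synthetic samples from a point 150 units off-axis and a camera accelerating as V(t) = 20 + 5t, the estimate agrees with the true d to within 0.01%.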
For a moving camera, the expression for d in Equation (8) is based on
integration and is the same for all t1, t2 (t1 ≠ t2). Given d, θ, and φ, the
3-D location of a point can be explicitly calculated. By combining Equations
(6) and (8), an integration-based expression for dθ(t)/dt is obtained:


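$$ \frac{d\theta(t)}{dt} = V(t)\,\sin^2\theta(t)\,\frac{\theta(t_2) - \theta(t_1)}{\int_{t_1}^{t_2} V(\tau)\,\sin^2\theta(\tau)\,d\tau} \tag{9} $$

Note that the ratio

$$ \frac{\theta(t_2) - \theta(t_1)}{\int_{t_1}^{t_2} V(t)\,\sin^2\theta(t)\,dt} \tag{10} $$

in Equations (8) and (9) can be computed independently of the time instant for
which the value of dθ(t)/dt is desired.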
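Because the ratio (10) is computed once per sequence, Equation (9) turns every angle sample into a rate estimate by a single multiplication, without differencing θ. A sketch assuming trapezoidal quadrature; the helper names are hypothetical:

```python
import math


def trapezoid(xs, ys):
    """Composite trapezoidal rule for samples ys taken at abscissae xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0 for i in range(len(xs) - 1))


def theta_rate_from_integral(t, theta, v):
    """Equations (9)-(10): dtheta/dt at every sample, with no differencing of theta."""
    integrand = [vk * math.sin(th) ** 2 for vk, th in zip(v, theta)]
    ratio = (theta[-1] - theta[0]) / trapezoid(t, integrand)  # Equation (10), equals 1/d
    return [ratio * g for g in integrand]                     # Equation (9)
```

The same `ratio` is reused for every sample, which is what makes the per-sample work independent and parallel.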

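For the special case where the speed of the camera is constant, i.e., V(t) = V:

$$ d = V\,\frac{\int_{t_1}^{t_2} \sin^2\theta(t)\,dt}{\theta(t_2) - \theta(t_1)} \tag{11} $$

and

$$ \frac{d\theta(t)}{dt} = \sin^2\theta(t)\,\frac{\theta(t_2) - \theta(t_1)}{\int_{t_1}^{t_2} \sin^2\theta(\tau)\,d\tau} \tag{12} $$

The latter result is independent of V.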
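For constant speed, the sketch below evaluates Equations (11) and (12) and makes the V-independence of (12) explicit: the speed only scales the distance estimate. Trapezoidal quadrature and the function names are assumptions.

```python
import math


def trapezoid(xs, ys):
    """Composite trapezoidal rule for samples ys taken at abscissae xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0 for i in range(len(xs) - 1))


def constant_speed_estimates(t, theta, v):
    """Equation (11) for d and Equation (12) for dtheta/dt when V(t) = v is constant.

    Only the d estimate uses v; the rate estimates do not depend on it.
    """
    s2 = [math.sin(th) ** 2 for th in theta]
    integral = trapezoid(t, s2)                    # integral of sin^2(theta) dt
    dtheta = theta[-1] - theta[0]
    d = v * integral / dtheta                      # Equation (11)
    rates = [s * dtheta / integral for s in s2]    # Equation (12)
    return d, rates
```

Passing a speed twice the true one doubles the returned d but leaves the list of rates unchanged.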

The relations (8) and (9) can also be expressed as time-independent
expressions. By substituting dx = V(t) dt in Equations (8) and (9) we get:

$$ d = \frac{\int_{x_1}^{x_2} \sin^2\theta(x)\,dx}{\theta(x_2) - \theta(x_1)} \tag{13} $$

and

$$ \frac{d\theta(x)}{dx} = \frac{\sin^2\theta(x)}{d} \tag{14} $$

where x is the location of the pinhole along the path of motion, and θ(x) is
the angle θ of the point P at location x.
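When the speed is known only as samples V(t_k), the substitution dx = V(t) dt can be carried out numerically before applying Equation (13). A minimal sketch with trapezoidal quadrature for both steps; the quadrature choice and the function names are assumptions:

```python
import math


def cumulative_positions(t, v, x0=0.0):
    """Pinhole positions x(t_k) = x0 + integral of V dt (trapezoidal rule), i.e. dx = V dt."""
    xs = [x0]
    for k in range(1, len(t)):
        xs.append(xs[-1] + (t[k] - t[k - 1]) * (v[k] + v[k - 1]) / 2.0)
    return xs


def distance_from_positions(x, theta):
    """Equation (13): d = integral of sin^2(theta) dx / (theta(x2) - theta(x1))."""
    s2 = [math.sin(th) ** 2 for th in theta]
    integral = sum((x[i + 1] - x[i]) * (s2[i] + s2[i + 1]) / 2.0 for i in range(len(x) - 1))
    return integral / (theta[-1] - theta[0])
```

When the robot reports positions directly, as with the fixed 0.635 mm increments in Section 4, the first step can be skipped.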


4.  EXPERIMENTAL  RESULTS 

A set of experiments has been conducted to test the proposed method. First
we describe the environment, then we explain the experiments, and finally we
show the results.

4.1. Setup (Refer to Figure 7)

A CCD video camera is attached to an IBM 7565 gantry robot, which has six
degrees of freedom. The optical axis of the camera and its direction of
motion coincide. The camera, with a field of view of about ±12°, was manually
focused on the object at its initial position only. It moves in such a way
that its distance from the object varies from 1125.2 mm to 299.7 mm. Dark
objects placed on a white background are used in this experiment for better
contrast. The images are processed on a PC-based vision system, and the
relevant image parameters (in our case, the edges) are extracted to subpixel
accuracy. Since the algorithm is parallel and each φ line can be analyzed
independently, we chose only the φ = 0 and φ = π lines for this experiment
(Figure 3). Similar processing could be applied to other values of the angle
φ as well.

4.2.  Finding  Edges  in  an  Image 

Due to image digitization, focusing problems, etc., edges in the image are
not sharp. For example, a 1-D image may look like Fig. 8a.

Certain commercial equipment, instruments or materials are identified in this paper in order to adequately specify the experimental
procedure. Such identification does not imply recommendation or endorsement by the National Institute of Standards and
Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.
For the pixels near an estimated edge, we used a polynomial approximation and
a brightness threshold to estimate the edge location. Refer to Fig. 8b.
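A simplified sketch of sub-pixel edge location along a 1-D profile follows. It finds the first pair of neighbouring pixels that straddle a brightness threshold, fits a least-squares straight line (a first-order polynomial, chosen here for brevity; the degree used in the experiments is not stated) to the pixels around that pair, and returns the position where the fitted line meets the threshold. The window size, the threshold choice and the function name are assumptions.

```python
def subpixel_edge(profile, threshold, half_window=2):
    """Return the sub-pixel position where `profile` first crosses `threshold`.

    profile: 1-D sequence of pixel intensities along a constant-phi line.
    Returns None if the profile never crosses the threshold.
    """
    if half_window < 1:
        raise ValueError("half_window must be at least 1")
    for i in range(len(profile) - 1):
        a, b = profile[i], profile[i + 1]
        if a == b or (a - threshold) * (b - threshold) > 0:
            continue  # this neighbouring pair does not straddle the threshold
        # Pixels i-half_window+1 .. i+half_window, clipped to the profile.
        lo = max(0, i - half_window + 1)
        hi = min(len(profile), i + half_window + 1)
        xs = list(range(lo, hi))
        ys = [float(p) for p in profile[lo:hi]]
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx  # least-squares slope of the first-order fit
        if slope == 0.0:
            return None
        return mean_x + (threshold - mean_y) / slope
    return None
```

For a dark object on a white background the profile decreases across the edge; the sign of the fitted slope handles both directions.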

4.3.  Moving  Along  the  Optical  Axis 

To ensure that the axis of motion coincides with the optical axis, the robot
was moved toward and away from a circular dark object along the X axis
(Figure 7). At each position we detected the edges of the dark circle relative
to the center of the image. The other five degrees of freedom of the robot
were used to adjust the position and orientation of the camera such that,
finally, the circle appeared in the middle of the image at all positions of
the camera along the motion axis.

4.4.  Pixel  Location  Conversion 

For the algorithm to work, we converted (x-y) to (θ-φ) at each pixel. As
mentioned earlier, we used the φ = 0 and φ = π radial lines. The conversion
resulted in a look-up table, which is shown pictorially in Figure 9; dpixel is
the pixel number counted from the central pixel.

4.5.  Measurement  of  Distances 

We showed our system a new object and computed the distance from the axis of
motion to a point on the object. In this experiment the optical axis is
parallel to the direction of motion. Forty images were taken at 0.635 mm
intervals and processed. The initial range from the focal point to the object
was 972.8 mm. We used (13) to estimate the distance d. The following table
summarizes the results. For integration purposes we assumed first-order
polynomials between two consecutive points, i.e., between two consecutive
measurements of θ, θ(x) was taken to lie on the straight line connecting the
points (x_i, θ(x_i)) and (x_{i+1}, θ(x_{i+1})).
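With θ(x) linear between consecutive measurements, the integral of sin²θ over each segment has a closed form, since ∫ sin²θ dθ = θ/2 - sin(2θ)/4, so no further quadrature approximation is needed. A sketch of Equation (13) under that assumption; the function names are hypothetical, and whether the original implementation integrated each segment exactly is not stated.

```python
import math


def _sin2_antiderivative(th):
    """Antiderivative of sin^2(theta) with respect to theta."""
    return th / 2.0 - math.sin(2.0 * th) / 4.0


def segment_integral_sin2(x0, x1, th0, th1, eps=1e-12):
    """Exact integral of sin^2(theta(x)) dx over [x0, x1] when theta is linear in x."""
    dx, dth = x1 - x0, th1 - th0
    if abs(dth) < eps:
        # theta is (nearly) constant on this segment
        return dx * math.sin(0.5 * (th0 + th1)) ** 2
    # Change of variable on a linear segment: dx = (dx / dth) * dtheta.
    return dx / dth * (_sin2_antiderivative(th1) - _sin2_antiderivative(th0))


def distance_piecewise_linear(x, theta):
    """Equation (13) with theta(x) interpolated linearly between measurements."""
    total = sum(segment_integral_sin2(x[i], x[i + 1], theta[i], theta[i + 1])
                for i in range(len(x) - 1))
    return total / (theta[-1] - theta[0])
```

With 40 noise-free synthetic samples spaced 0.635 mm apart, starting 972.8 mm from a hypothetical point 100 mm off-axis, the estimate recovers d to within 0.001%, so on noise-free data the linear interpolation contributes far less error than the values reported in Table 1.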

In Table 1 we show the measurement errors. As mentioned earlier, a total of
40 images were taken. We grouped them into 36 sets of 5 consecutive images
each, that is,

images 1, 2, 3, 4, 5
images 2, 3, 4, 5, 6
...
images 36, 37, 38, 39, 40

and computed the distance d from (13) for each set, which yielded a different
result for d for each set. When the results of the 36 sets were averaged, the
average error was 2.08%; the average absolute error of d was 12.82%. One of
the 36 sets resulted in a 0.13% error.

Similar computations were done for 10, 20, 30, 38, and 40 images in each set.
Note that for a set of 40 images the error was 0.56%.
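The grouping into overlapping sets and the three error columns of Table 1 can be sketched as below. The error definition (signed percentage relative to the true d) is an assumption consistent with the table, in which the average error never exceeds the average absolute error; the function names are hypothetical, and the per-set estimates would come from Equation (13).

```python
def sliding_windows(n_images, size):
    """0-based [start, stop) index ranges of consecutive image sets.

    For n_images=40 and size=5 this yields images 1-5, 2-6, ..., 36-40.
    """
    if not 1 <= size <= n_images:
        raise ValueError("set size must be between 1 and n_images")
    return [(start, start + size) for start in range(n_images - size + 1)]


def error_summary(estimates, d_true):
    """Mean signed, mean absolute and minimum absolute error of d, in percent.

    Assumption: the error of one set is 100 * (d_estimate - d_true) / d_true.
    """
    errors = [100.0 * (e - d_true) / d_true for e in estimates]
    abs_errors = [abs(err) for err in errors]
    return sum(errors) / len(errors), sum(abs_errors) / len(abs_errors), min(abs_errors)
```

The set counts 36, 31, 21, 11, 3 and 1 in Table 1 are exactly 40 - n + 1 for n images per set, which is what `sliding_windows` produces.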


TABLE 1: RESULTS

Images   Camera location   Number   Average         Average absolute   Minimum absolute
per set  increment (mm)    of sets  error of d (%)  error of d (%)     error of d (%)

5        0.635             36       2.08            12.82              0.13
10       0.635             31       0.71            8.77               0.62
20       0.635             21       3.38            4.97               0.01
30       0.635             11       2.67            2.67               0.18
38       0.635             3        0.32            0.60               0.42
40       0.635             1        0.56            0.56               0.56

5.  CONCLUSIONS 

An algorithm for depth estimation of a traceable point has been
presented. In θ-φ coordinates all traceable points in the image can be
reconstructed concurrently. Due to the integration process, the points whose
3-D locations are desired do not have to be reliably traced. Errors caused by
an unfocused camera, camera noise, (X-Y) to (θ-φ) conversion, non-ideal motion
of the camera, etc. are "smoothed" by the reconstruction algorithm. The
method may also work for object features such as centroids of 2-D objects,
which can be traced more reliably than visible feature points. The location
of the FOE is assumed to be known. In many practical cases this is not so;
however, the FOE can be obtained using methods such as those described by Jain
[16], Negahdaripour et al. [17], and Vitoria Lobo et al. [19], or by using an
inertial navigation system. An average absolute error in distance
measurements of 0.56% for 40 images has been obtained for a camera whose
optical axis is parallel to the motion direction. Better results may be
obtained with an improved calibration process, a focused camera, higher
camera resolution, a better robot, a wider range of θ, larger values of θ (in
particular θ near 90°), more sampled images, improved edge detection methods,
and better numerical integration methods.